Exploratory Analysis of Prosper Loans

Ly Vinh Hung (Tommy)

In fulfillment of Udacity Data Analyst Nanodegree Project 4

Prosper Loan data provided by Udacity

Abstract

Prosper.com is a P2P lending marketplace, similarly with LendingClub.com, with the notorious IPO in 2014. Prosper Loan data provided by Udacity was last updated 03/11/2014

Initial Dataset Exploration

## [1] "/Users/tommyly/Documents/Udacity/p4 - Explore and Summarize Data with R"

Let’s checkout the initial structure of the dataset

##  [1] "ListingKey"                         
##  [2] "ListingNumber"                      
##  [3] "ListingCreationDate"                
##  [4] "CreditGrade"                        
##  [5] "Term"                               
##  [6] "LoanStatus"                         
##  [7] "ClosedDate"                         
##  [8] "BorrowerAPR"                        
##  [9] "BorrowerRate"                       
## [10] "LenderYield"                        
## [11] "EstimatedEffectiveYield"            
## [12] "EstimatedLoss"                      
## [13] "EstimatedReturn"                    
## [14] "ProsperRating..numeric."            
## [15] "ProsperRating..Alpha."              
## [16] "ProsperScore"                       
## [17] "ListingCategory..numeric."          
## [18] "BorrowerState"                      
## [19] "Occupation"                         
## [20] "EmploymentStatus"                   
## [21] "EmploymentStatusDuration"           
## [22] "IsBorrowerHomeowner"                
## [23] "CurrentlyInGroup"                   
## [24] "GroupKey"                           
## [25] "DateCreditPulled"                   
## [26] "CreditScoreRangeLower"              
## [27] "CreditScoreRangeUpper"              
## [28] "FirstRecordedCreditLine"            
## [29] "CurrentCreditLines"                 
## [30] "OpenCreditLines"                    
## [31] "TotalCreditLinespast7years"         
## [32] "OpenRevolvingAccounts"              
## [33] "OpenRevolvingMonthlyPayment"        
## [34] "InquiriesLast6Months"               
## [35] "TotalInquiries"                     
## [36] "CurrentDelinquencies"               
## [37] "AmountDelinquent"                   
## [38] "DelinquenciesLast7Years"            
## [39] "PublicRecordsLast10Years"           
## [40] "PublicRecordsLast12Months"          
## [41] "RevolvingCreditBalance"             
## [42] "BankcardUtilization"                
## [43] "AvailableBankcardCredit"            
## [44] "TotalTrades"                        
## [45] "TradesNeverDelinquent..percentage." 
## [46] "TradesOpenedLast6Months"            
## [47] "DebtToIncomeRatio"                  
## [48] "IncomeRange"                        
## [49] "IncomeVerifiable"                   
## [50] "StatedMonthlyIncome"                
## [51] "LoanKey"                            
## [52] "TotalProsperLoans"                  
## [53] "TotalProsperPaymentsBilled"         
## [54] "OnTimeProsperPayments"              
## [55] "ProsperPaymentsLessThanOneMonthLate"
## [56] "ProsperPaymentsOneMonthPlusLate"    
## [57] "ProsperPrincipalBorrowed"           
## [58] "ProsperPrincipalOutstanding"        
## [59] "ScorexChangeAtTimeOfListing"        
## [60] "LoanCurrentDaysDelinquent"          
## [61] "LoanFirstDefaultedCycleNumber"      
## [62] "LoanMonthsSinceOrigination"         
## [63] "LoanNumber"                         
## [64] "LoanOriginalAmount"                 
## [65] "LoanOriginationDate"                
## [66] "LoanOriginationQuarter"             
## [67] "MemberKey"                          
## [68] "MonthlyLoanPayment"                 
## [69] "LP_CustomerPayments"                
## [70] "LP_CustomerPrincipalPayments"       
## [71] "LP_InterestandFees"                 
## [72] "LP_ServiceFees"                     
## [73] "LP_CollectionFees"                  
## [74] "LP_GrossPrincipalLoss"              
## [75] "LP_NetPrincipalLoss"                
## [76] "LP_NonPrincipalRecoverypayments"    
## [77] "PercentFunded"                      
## [78] "Recommendations"                    
## [79] "InvestmentFromFriendsCount"         
## [80] "InvestmentFromFriendsAmount"        
## [81] "Investors"
## 'data.frame':    113937 obs. of  81 variables:
##  $ ListingKey                         : Factor w/ 113066 levels "00003546482094282EF90E5",..: 7180 7193 6647 6669 6686 6689 6699 6706 6687 6687 ...
##  $ ListingNumber                      : int  193129 1209647 81716 658116 909464 1074836 750899 768193 1023355 1023355 ...
##  $ ListingCreationDate                : Factor w/ 113064 levels "2005-11-09 20:44:28.847000000",..: 14184 111894 6429 64760 85967 100310 72556 74019 97834 97834 ...
##  $ CreditGrade                        : Factor w/ 9 levels "","A","AA","B",..: 5 1 8 1 1 1 1 1 1 1 ...
##  $ Term                               : int  36 36 36 36 36 60 36 36 36 36 ...
##  $ LoanStatus                         : Factor w/ 12 levels "Cancelled","Chargedoff",..: 3 4 3 4 4 4 4 4 4 4 ...
##  $ ClosedDate                         : Factor w/ 2803 levels "","2005-11-25 00:00:00",..: 1138 1 1263 1 1 1 1 1 1 1 ...
##  $ BorrowerAPR                        : num  0.165 0.12 0.283 0.125 0.246 ...
##  $ BorrowerRate                       : num  0.158 0.092 0.275 0.0974 0.2085 ...
##  $ LenderYield                        : num  0.138 0.082 0.24 0.0874 0.1985 ...
##  $ EstimatedEffectiveYield            : num  NA 0.0796 NA 0.0849 0.1832 ...
##  $ EstimatedLoss                      : num  NA 0.0249 NA 0.0249 0.0925 ...
##  $ EstimatedReturn                    : num  NA 0.0547 NA 0.06 0.0907 ...
##  $ ProsperRating..numeric.            : int  NA 6 NA 6 3 5 2 4 7 7 ...
##  $ ProsperRating..Alpha.              : Factor w/ 8 levels "","A","AA","B",..: 1 2 1 2 6 4 7 5 3 3 ...
##  $ ProsperScore                       : num  NA 7 NA 9 4 10 2 4 9 11 ...
##  $ ListingCategory..numeric.          : int  0 2 0 16 2 1 1 2 7 7 ...
##  $ BorrowerState                      : Factor w/ 52 levels "","AK","AL","AR",..: 7 7 12 12 25 34 18 6 16 16 ...
##  $ Occupation                         : Factor w/ 68 levels "","Accountant/CPA",..: 37 43 37 52 21 43 50 29 24 24 ...
##  $ EmploymentStatus                   : Factor w/ 9 levels "","Employed",..: 9 2 4 2 2 2 2 2 2 2 ...
##  $ EmploymentStatusDuration           : int  2 44 NA 113 44 82 172 103 269 269 ...
##  $ IsBorrowerHomeowner                : Factor w/ 2 levels "False","True": 2 1 1 2 2 2 1 1 2 2 ...
##  $ CurrentlyInGroup                   : Factor w/ 2 levels "False","True": 2 1 2 1 1 1 1 1 1 1 ...
##  $ GroupKey                           : Factor w/ 707 levels "","00343376901312423168731",..: 1 1 335 1 1 1 1 1 1 1 ...
##  $ DateCreditPulled                   : Factor w/ 112992 levels "2005-11-09 00:30:04.487000000",..: 14347 111883 6446 64724 85857 100382 72500 73937 97888 97888 ...
##  $ CreditScoreRangeLower              : int  640 680 480 800 680 740 680 700 820 820 ...
##  $ CreditScoreRangeUpper              : int  659 699 499 819 699 759 699 719 839 839 ...
##  $ FirstRecordedCreditLine            : Factor w/ 11586 levels "","1947-08-24 00:00:00",..: 8639 6617 8927 2247 9498 497 8265 7685 5543 5543 ...
##  $ CurrentCreditLines                 : int  5 14 NA 5 19 21 10 6 17 17 ...
##  $ OpenCreditLines                    : int  4 14 NA 5 19 17 7 6 16 16 ...
##  $ TotalCreditLinespast7years         : int  12 29 3 29 49 49 20 10 32 32 ...
##  $ OpenRevolvingAccounts              : int  1 13 0 7 6 13 6 5 12 12 ...
##  $ OpenRevolvingMonthlyPayment        : num  24 389 0 115 220 1410 214 101 219 219 ...
##  $ InquiriesLast6Months               : int  3 3 0 0 1 0 0 3 1 1 ...
##  $ TotalInquiries                     : num  3 5 1 1 9 2 0 16 6 6 ...
##  $ CurrentDelinquencies               : int  2 0 1 4 0 0 0 0 0 0 ...
##  $ AmountDelinquent                   : num  472 0 NA 10056 0 ...
##  $ DelinquenciesLast7Years            : int  4 0 0 14 0 0 0 0 0 0 ...
##  $ PublicRecordsLast10Years           : int  0 1 0 0 0 0 0 1 0 0 ...
##  $ PublicRecordsLast12Months          : int  0 0 NA 0 0 0 0 0 0 0 ...
##  $ RevolvingCreditBalance             : num  0 3989 NA 1444 6193 ...
##  $ BankcardUtilization                : num  0 0.21 NA 0.04 0.81 0.39 0.72 0.13 0.11 0.11 ...
##  $ AvailableBankcardCredit            : num  1500 10266 NA 30754 695 ...
##  $ TotalTrades                        : num  11 29 NA 26 39 47 16 10 29 29 ...
##  $ TradesNeverDelinquent..percentage. : num  0.81 1 NA 0.76 0.95 1 0.68 0.8 1 1 ...
##  $ TradesOpenedLast6Months            : num  0 2 NA 0 2 0 0 0 1 1 ...
##  $ DebtToIncomeRatio                  : num  0.17 0.18 0.06 0.15 0.26 0.36 0.27 0.24 0.25 0.25 ...
##  $ IncomeRange                        : Factor w/ 8 levels "$0","$1-24,999",..: 4 5 7 4 3 3 4 4 4 4 ...
##  $ IncomeVerifiable                   : Factor w/ 2 levels "False","True": 2 2 2 2 2 2 2 2 2 2 ...
##  $ StatedMonthlyIncome                : num  3083 6125 2083 2875 9583 ...
##  $ LoanKey                            : Factor w/ 113066 levels "00003683605746079487FF7",..: 100337 69837 46303 70776 71387 86505 91250 5425 908 908 ...
##  $ TotalProsperLoans                  : int  NA NA NA NA 1 NA NA NA NA NA ...
##  $ TotalProsperPaymentsBilled         : int  NA NA NA NA 11 NA NA NA NA NA ...
##  $ OnTimeProsperPayments              : int  NA NA NA NA 11 NA NA NA NA NA ...
##  $ ProsperPaymentsLessThanOneMonthLate: int  NA NA NA NA 0 NA NA NA NA NA ...
##  $ ProsperPaymentsOneMonthPlusLate    : int  NA NA NA NA 0 NA NA NA NA NA ...
##  $ ProsperPrincipalBorrowed           : num  NA NA NA NA 11000 NA NA NA NA NA ...
##  $ ProsperPrincipalOutstanding        : num  NA NA NA NA 9948 ...
##  $ ScorexChangeAtTimeOfListing        : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ LoanCurrentDaysDelinquent          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ LoanFirstDefaultedCycleNumber      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ LoanMonthsSinceOrigination         : int  78 0 86 16 6 3 11 10 3 3 ...
##  $ LoanNumber                         : int  19141 134815 6466 77296 102670 123257 88353 90051 121268 121268 ...
##  $ LoanOriginalAmount                 : int  9425 10000 3001 10000 15000 15000 3000 10000 10000 10000 ...
##  $ LoanOriginationDate                : Factor w/ 1873 levels "2005-11-15 00:00:00",..: 426 1866 260 1535 1757 1821 1649 1666 1813 1813 ...
##  $ LoanOriginationQuarter             : Factor w/ 33 levels "Q1 2006","Q1 2007",..: 18 8 2 32 24 33 16 16 33 33 ...
##  $ MemberKey                          : Factor w/ 90831 levels "00003397697413387CAF966",..: 11071 10302 33781 54939 19465 48037 60448 40951 26129 26129 ...
##  $ MonthlyLoanPayment                 : num  330 319 123 321 564 ...
##  $ LP_CustomerPayments                : num  11396 0 4187 5143 2820 ...
##  $ LP_CustomerPrincipalPayments       : num  9425 0 3001 4091 1563 ...
##  $ LP_InterestandFees                 : num  1971 0 1186 1052 1257 ...
##  $ LP_ServiceFees                     : num  -133.2 0 -24.2 -108 -60.3 ...
##  $ LP_CollectionFees                  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LP_GrossPrincipalLoss              : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LP_NetPrincipalLoss                : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LP_NonPrincipalRecoverypayments    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ PercentFunded                      : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ Recommendations                    : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ InvestmentFromFriendsCount         : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ InvestmentFromFriendsAmount        : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Investors                          : int  258 1 41 158 20 1 1 1 1 1 ...
## [1] 113937
## [1] 81

Univariate Plots Section

##    AA     A     B     C     D     E    HR    NA  NA's 
##  5372 14551 15581 18345 14274  9795  6935     0 29084

The majority of loan are non-classify and among those are being rated, “C” rating loan request are the majority.

##     1     2     3     4     5     6     7     8     9    10    11  NA's 
##   992  5766  7642 12595  9813 12278 10597 12053  6911  4750  1456 29084

Here we see another rating method. Again, most of the loans are not properly rated and rating is pretty concentrated from 4-8 range.

Surprisigly, people with less than 25,000 USD annual income were not the majority of borrowers. I suspect this trend indicate the customer segment Prosper Loan’s going after. It’s possible the main target are future home-owner, car-owner, studebt-debt payment.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   0.000   0.140   0.220   0.276   0.320  10.010    8554

The data has a long-tailed right-skewed but as expected. It’s expected the majority of people in U.S have a credit history, and the ratio should be low enough for a secured repayment. Seems like 25% is the threshold for most borrower.

## Debt Consolidation   Home Improvement           Business 
##              58308               7433               7189 
##      Personal Loan        Student Use               Auto 
##               2395                756               2572 
##    Baby & Adoption               Boat Cosmetic Procedure 
##                199                 85                 91 
##    Engagement Ring        Green Loans Household Expenses 
##                217                 59               1996 
##    Large Purchases     Medical/Dental         Motorcycle 
##                876               1522                304 
##                 RV              Taxes           Vacation 
##                 52                885                768 
##      Wedding Loans              Other      Not Available 
##                771              10494              16965

Again, we see not many people’re willing to explain the purpose of loan listing. There’s a surprisingly amount of needs for debt consolidation, more than 50%. Now the data for income range makes perfect sense, as young people are going out to the real world and start to repay their student debt, purchase cars, mortgage their apartment, etc. Therefore, people already been in the workforce for many decades wouldn’t need much Prosper service.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    2005    2008    2012    2011    2013    2014

The chart took a plunge in 2009 and exploded the year after and peaked in 2013. Now, this period is coincide with the 2008 Global Financial Crisis and government must have conducted certain action toward credit-typed business. Doing a quick Google search, we now understand the period of October 15, 2008 to July 13, 2009 is Prosper’s QuietSEC Period, from which they are suspended for lending activities upon SEC approval. Prosper relaunched in July 2009, most notable hypothesis is Prosper needed to adjust their scoring algorigthm to ensure tighter interest rate & stricter rating criteria.

Again, we so almost little to no loan activities in 2009 and the extremely high growth in 2013. Probably in 2013, business went back to normal and the demand for credit increased.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1000    4000    6500    8337   12000   35000

Here is another interesting insight, the minimum loan amoutn is $1,000 USD and people tend to borrow in 4 main range ($5000, $10000, $15000, $20000). Although this might be more than enough for them to cover their original need, people tend to check these rounded amount boxes.

##                    Employed     Full-time Not available  Not employed 
##          2255         67322         26355          5347           835 
##         Other     Part-time       Retired Self-employed 
##          3806          1088           795          6134

Among these criteria, probably people would tick in “Employed” box & would not fill-in other details, which also explained why self-employed borrowers are higher than part-time employment.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0    3200    4667    5608    6825 1750000

Not so many intresting insight here, median income range is at $4667 and mean of $5608 which is quite average for a professional with few years of experience.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   0.000   2.167   5.583   8.006  11.420  62.920    7625

As expected, I see the data is in favor of people have fewer years of employment and as they aging, they do not need much loan as before.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.1340  0.1840  0.1928  0.2500  0.4975

As an outsider from finance industry, it’s interesting to know the median borrower rate. I’d suspect they have different borrower rate for different term & rating but here we get a median of 18.4%. However, there is a huge notable spike around 31%, which is a very attractive rate for the investors. This is something we can keep in mind while looking for further investigation.

This is an interesting piece of information we would want to investigate further. 36-month term loan is the most sought-after term, if we couple this with the previous common Borrower Rate, a 3 years loan with median 18.4% borrower rate is a very attractive rate for this business.

This chart is quite similar with the previous chat for borrower rate. As someone who’s completely new to the industry, I did not fully comprehend there’s a very linear and strong relationship between borrower rate and lender yield, which unfold later on during bivariate analysis.

## 
## FALSE  TRUE 
## 86123 27814

Out of 113937 loans, about 24.4% are funded individually. This is a very high concentration toward 1 investor. However, if we keep in mind 75% of loan are below $12000. This would make sense.

Univariate Analysis

What is the structure of your dataset?

The dataset is comprised of 88 variables with 113937 observations. The loans cover the period 2005 & 2014. Variables are of classes int, numeric, date, and factor. Although 88 variables seems to be intimidating at first, we can split these variables into 2 main players: the “Borrowers” variables & “Investors” variables.

What is/are the main feature(s) of interest in your dataset?

For the sake of this analysis, let’s split the dataset into 2 different players:

For Borrower, I believe the Prosper Score, Proser Rating (both numeric & alphabet) are the main indicators of a quality of borrowers. Further bivariate analysis would help to learn about these 2.

For Lender, I now understand Lender Yield is the most important factor for investor.

As a marketplace, Prosper would be more interest in Service Fees & Collection Fees but we would not dive into these variables in our analysis.

What other features in the dataset do you think will help support your investigation into your feature(s) of interest?

So far, I have seen some evidences that people take more loans in specific months and the interest rate of these loans are strongly related to the term of the each loan. There is something curious about loan with 37%-38% of interest rate and I have not understood why loan with term of 36 month are so much more dispersed than 60 months, although the median of interest rates on 60 month are slightly greater than 36 months.

Also, I didn’t find nothing so interesting about risk score and income level, two variable that I was expecting more. On the other hand, maybe this variables have more to tell when related to others. I would like to understand better why loans with term of 36-months are more dispersed than 60 months. Also I want to find out why risky loans are classified that way.

Did you create any new variables from existing variables in the dataset?

No. I was factorizing a few variables but did not create any new one.

Of the features you investigated, were there any unusual distributions? Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?

Yes, there is few unsual distribution, I was particularly pay attention to the high spike in lender yield & borrower rate and the spike in LoanOriginalAmount that people tend to buy in bulk. I was factorizing data and since categorical variables enter into statistical models differently than continuous variables, storing data as factors insures that the modeling functions will treat such data correctly.

Bivariate Plots Section

As we can see, with better rating and safe borrower, the borrower rate is having an inveser relationship with the prosper rating.

The lower the rank, the more loans get defaulted. The graph might be mistaken as E and HR have less defaulted loans than others, but also keeps in mind loans were less categorized into that area.

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##        0    38400    56000    67300    81900 21000000

For this graph, we explicitly excluded MonthlyIncome more than 9000 and no income as there’re few outliers have monthly income more than 9000 and the chart cannot displayed properly.

Retired category seems to have higher loan original amount than part-time. This part of data is quite interesting but I don’t really know why such event would happen.

## [1] 0.06291678

As much as I want to believe, although there is some sort of relationship between Debt To Income ratio & Borrower Rate, the relationship is not that significant when we do a correlation test. I also did a few back-of-the-envelope graph and shows no significant relationship between BorrowerRate & other factors.

Similar graph with Borrower Rate, nothing worth notice here.

I initially thought there must be better yield for riskier category but it seems like there’s no relationship here.

This chart is a new interesting insight, although the majority of loan are in 36-month term. The Loan original amount is significantly higher for 60 months term. Let’s see if it’s hold true for Lender Yield

Unfortunately, this is not the case. Term Loan have higher Loan Original Amount because it’d need more time to repay those loans.

Bivariate Analysis

Talk about some of the relationships you observed in this part of the investigation. How did the feature(s) of interest vary with other features in the dataset?

I want to explore 2 main variables Lender Yield and Borrower Rate and see which other related variables have the most effect on it. So far, the only relationship I found is through the proprietory Prosper Scoring system. Other factor I was trying to compare was not having any particular relationship. #### Did you observe any interesting relationships between the other features (not the main feature(s) of interest)? No #### What was the strongest relationship you found? The relationship between (Prosper Rating and Lender Yield) and (Prosper Rating and Borrower Rate) has an inversed relationship. The higher the rating, the lower the borrower rate and lender yield.

Multivariate Plots Section

Here is how we take a closer look at Lender Yield vs Prosper Rating and how Prosper Rating was influced by Debt to Income Ratio.

This is a closer look for lender yield vs prosper rating. The majority of loans opt-in for 36-month term and the return for 36-month and 60-month are just higher than 12-month, also considering the fact there’re less loan in 12-month term than other term.

Prosper must have optimized their model throughout the year and as we see the borrower throughout the year, the variation between borrower rate is not that significant anymore and we tend to have smaller standard deviation year-over-year. Something worth noticing is the amount of borrowing suddenly decreased in 2013.

Multivariate Analysis

Talk about some of the relationships you observed in this part of the investigation. Were there features that strengthened each other in terms of looking at your feature(s) of interest?

Term loan is quite a good indicator whether we have a better Lender Yield or not. Also, we see how three variables Lender Yield, Prosper Rating and Debt To Income Ratio come together and how it affect each order. ### Were there any interesting or surprising interactions between features? The criteria for being an AA borrowers seems to be tighten over the year and a there seems to be a fixed borrower rate in criteria HR and AA. ### OPTIONAL: Did you create any models with your dataset? Discuss the strengths and limitations of your model. No ——

Final Plots and Summary

Plot One:

Description One

This basically best summarize the relationship between Lender Yield and Prosper Rating. The higher the risk, the lower the rating, the better the lender yield. We also noticed high rank like AA would not have D/I ration more than 25% and although most borrowers have lower D/I Ratio, there’re still high D/I ratio borrowers and fall in lower ProsperRating. Therefore, we see a somewhat upward triangular shape.

Plot Two

Description Two

This chart confirmed that although there is higher lending yield for lower rating. The investors should also takes a look at the risk of the loan being defaulted. We see very clear in this chart as the rating get worsen, the higher chance the loan getting defaulted.

Plot Three

Description Three

This graph shows a very interesting time in U.S credit business history where most of the banks and financial institution gone under strong investigation from the SEC, only to recover and went back to business and growein stronger than ever.

Reflection

I’d almost considering switching to another dataset as the more I’m looking at, the more I get confused and see no significant correlation between the variables. I was suspecting to see few obvious trend and correlation without much investigation but I was wrong. Although there are more than 30 plots in this report. Lots of charts were never appeared here as I did lots of back-of-the-envelope visualisation but wouldn’t make any good story. In addition to that, I have to go through each variable one by one, try to understand the meaning as I was not an expert in the financial industry.

I then categorize these variables into three different players: Prosper company, lenders & borrowers. Only then the variables makes a lot more sense & I’m trying to make assumption a long the way to explore it. At the end of the project, it did give me a sense of understanding for the lending business.

As I progress through the analysis, I only found Debt to Income ratio is the predictor of Lending Yield and Borrower Rate and the other variables seems not to play that big of a role. To expand to project, we can try to combine various variables and make a predictive model using logistic regression for a go/no-go investment decision or any classification algorigthm would make sense.